Language is an individual's means of expressing thoughts. Every language has its own set of alphabetic and numeric characters, and people can communicate with one another through speech or writing. However, deaf and/or mute individuals communicate through sign language, and spoken languages have signed counterparts. Bangla likewise has a sign language, known as BDSL. This dataset consists of images of Bangla hand signs and covers 49 individual Bangla sign letters. BDSL49 is a dataset of 29,490 images with 49 labels. During data collection, images were recorded from 14 different adults, each with a different background and appearance. Several strategies were applied during preparation to remove noise from the dataset. The dataset is freely available to researchers, who can use it to develop automated systems with machine learning, computer vision, and deep learning techniques. In addition, two models were applied to the dataset: the first for detection and the second for recognition.
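The abstract names a detect-then-recognize pipeline but not the specific architectures, so the following is only a minimal sketch of that two-stage idea, with a generic torchvision detector standing in for the hand detector and a small CNN with 49 outputs standing in for the recognizer; all model choices here are assumptions.

```python
# Hypothetical two-stage BDSL49 pipeline: a detector localizes the signing hand,
# then a classifier assigns one of the 49 Bangla sign labels to the cropped region.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = torchvision.models.resnet18(weights="DEFAULT")
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 49)  # 49 BDSL labels
classifier.eval()

to_tensor = transforms.ToTensor()
clf_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def predict_sign(image_path: str) -> int:
    """Detect the highest-scoring hand box, crop it, and classify the sign."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        det = detector([to_tensor(img)])[0]
        if len(det["boxes"]) == 0:
            return -1  # no hand detected
        x1, y1, x2, y2 = det["boxes"][0].int().tolist()  # top-scoring box
        crop = img.crop((x1, y1, x2, y2))
        logits = classifier(clf_tf(crop).unsqueeze(0))
        return int(logits.argmax(dim=1))
```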
Hilsa is the national fish of Bangladesh. Bangladesh earns a substantial amount of foreign currency by exporting this fish. Unfortunately, in recent times some unscrupulous traders have been selling fake hilsa fish for profit; sardine and sardinella are the fish most commonly sold in the market as hilsa. The Bangladesh Food Safety Authority, a government agency of Bangladesh, has stated that these fake hilsa fish contain high levels of cadmium and lead, which are harmful to humans. In this study, we propose a method that can easily identify original and fake hilsa fish. Based on a search of the online literature, this is the first study on identifying original hilsa fish. We collected more than 16,000 images of original and counterfeit hilsa fish. To classify these images, we used several deep learning based models and then compared their performance. Among these models, DenseNet201 achieved the highest accuracy, 97.02%.
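As a concrete illustration of the best-performing model mentioned above, here is a minimal sketch (not the authors' code) of fine-tuning DenseNet201 for the binary original-vs-fake hilsa task; the directory paths, input size, and hyperparameters are placeholders.

```python
# Fine-tune an ImageNet-pretrained DenseNet201 on original vs. fake hilsa images.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.DenseNet201(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # start by training only the classification head

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),  # simple [0, 1] rescaling
    base,
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # original vs. fake hilsa
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical dataset layout: hilsa/train/{original,fake}/... and hilsa/val/...
train_ds = tf.keras.utils.image_dataset_from_directory(
    "hilsa/train", image_size=(224, 224), batch_size=32, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "hilsa/val", image_size=(224, 224), batch_size=32, label_mode="binary")

model.fit(train_ds, validation_data=val_ds, epochs=10)
```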
Deep learning models deliver remarkable results in image processing by learning from the datasets they are trained on. Spinach is a leafy vegetable rich in vitamins and nutrients. In our study, a deep learning approach was used to automatically identify spinach, based on a dataset of five spinach varieties containing a total of 3,785 images. Four convolutional neural network (CNN) models were used to classify the spinach, as these models provide highly accurate results for image classification. Before applying the models, the image data were preprocessed using several steps: RGB conversion, filtering, resizing, rescaling, and categorization. After these steps, the image data were ready for use by the classification algorithms. The accuracy of these classifiers ranged from 98.68% to 99.79%, with VGG16 achieving the highest accuracy of 99.79%.
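The preprocessing steps listed above can be sketched in a few lines; the particular filter kernel and target size below are assumptions, since the abstract only names the step types.

```python
# Illustrative preprocessing pipeline for the spinach images:
# RGB conversion, light filtering, resizing, and rescaling to [0, 1].
import cv2
import numpy as np

def preprocess(path: str, size: int = 224) -> np.ndarray:
    img = cv2.imread(path)                          # loads as BGR uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # RGB conversion
    img = cv2.GaussianBlur(img, (3, 3), 0)          # filtering to reduce noise
    img = cv2.resize(img, (size, size))             # resizing to the network input
    return img.astype(np.float32) / 255.0           # rescaling to [0, 1]
```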
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/.
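To make the multi-task readout concrete, here is a hedged sketch of one shared encoder feeding both an image-level brain-region head and a pixel-level microstructure segmentation head; the backbone, feature sizes, and class counts are illustrative placeholders, not the benchmark's reference implementation.

```python
# One encoder, two readouts: image-level region prediction and per-pixel segmentation.
import torch
import torch.nn as nn
import torchvision

class MultiTaskReadout(nn.Module):
    def __init__(self, n_regions: int = 4, n_microstructures: int = 4):  # placeholder counts
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])     # spatial feature map
        self.region_head = nn.Linear(512, n_regions)                      # image-level label
        self.seg_head = nn.Conv2d(512, n_microstructures, kernel_size=1)  # per-pixel labels

    def forward(self, x):
        # x: (B, 3, H, W); grayscale microCT slices would be replicated to 3 channels
        feats = self.encoder(x)                                  # (B, 512, H/32, W/32)
        region_logits = self.region_head(feats.mean(dim=(2, 3)))
        seg_logits = nn.functional.interpolate(
            self.seg_head(feats), size=x.shape[-2:], mode="bilinear", align_corners=False)
        return region_logits, seg_logits
```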
Jamdani is the strikingly patterned textile heritage of Bangladesh. The exclusive geometric motifs woven into the fabric are the most attractive part of this craftsmanship, having a remarkable influence on textile and fine art. In this paper, we have developed a technique based on the Generative Adversarial Network that can learn to generate entirely new Jamdani patterns from a collection of Jamdani motifs that we assembled; the newly formed motifs mimic the appearance of the original designs. Users can input the skeleton of a desired pattern in terms of rough strokes, and our system finalizes the input by generating the complete motif that follows the geometric structure of real Jamdani motifs. To serve this purpose, we collected and preprocessed a dataset containing a large number of Jamdani motif images from authentic sources via fieldwork and applied a state-of-the-art method called pix2pix to it. To the best of our knowledge, this dataset is currently the only available dataset of Jamdani motifs in digital format for computer vision research. Our experimental results of the pix2pix model on this dataset show satisfactory computer-generated images of Jamdani motifs, and we believe that our work will open a new avenue for further research.
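For readers unfamiliar with pix2pix, the sketch below illustrates one training step of the standard conditional-GAN objective (adversarial loss plus L1) applied to the stroke-to-motif task; `G` and `D` stand in for any pix2pix-style U-Net generator and PatchGAN discriminator and are not the authors' exact models.

```python
# One pix2pix training step: sketch (rough strokes) -> complete Jamdani motif.
import torch
import torch.nn.functional as F

def pix2pix_step(G, D, opt_G, opt_D, sketch, motif, lambda_l1=100.0):
    # --- discriminator: real (sketch, motif) pairs vs. generated pairs ---
    fake = G(sketch)
    d_real = D(torch.cat([sketch, motif], dim=1))
    d_fake = D(torch.cat([sketch, fake.detach()], dim=1))
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))) * 0.5
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- generator: fool D while staying close to the real motif in L1 ---
    d_fake = D(torch.cat([sketch, fake], dim=1))
    loss_G = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake)) +
              lambda_l1 * F.l1_loss(fake, motif))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```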
The paper discusses the potential of large vision-language models as objects of interest for empirical cultural studies. Focusing on the comparative analysis of outputs from two popular text-to-image synthesis models, DALL-E 2 and Stable Diffusion, the paper tries to tackle the pros and cons of striving towards culturally agnostic vs. culturally specific AI models. The paper discusses several examples of memorization and bias in generated outputs which showcase the trade-off between risk mitigation and cultural specificity, as well as the overall impossibility of developing culturally agnostic models.
The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French territory and provides referential geographical datasets, including high-resolution aerial images and topographic maps. The monitoring of land-cover plays a crucial role in land management and planning initiatives, which can have significant socio-economic and environmental impact. Together with remote sensing technologies, artificial intelligence (AI) promises to become a powerful tool in determining land-cover and its evolution. IGN is currently exploring the potential of AI in the production of high-resolution land cover maps. Notably, deep learning methods are employed to obtain a semantic segmentation of aerial images. However, territories as large as France imply heterogeneous contexts: variations in landscapes and image acquisition make it challenging to provide uniform, reliable and accurate results across all of France. The FLAIR-one dataset presented here is part of the dataset currently used at IGN to establish the French national reference land cover map "Occupation du sol à grande échelle" (OCS-GE).
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked-out, image-text aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters and set new records on a broad range of representative vision downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation and semantic segmentation, without heavy supervised training. Moreover, we observe that quantitative changes in scaling EVA result in qualitative changes in transfer learning performance that are not present in other models. For instance, EVA takes a great leap in the challenging large-vocabulary instance segmentation task: our model achieves almost the same state-of-the-art performance on the LVISv1.0 dataset, with over a thousand categories, as on the COCO dataset, with only eighty categories. Beyond a pure vision encoder, EVA can also serve as a vision-centric, multi-modal pivot to connect images and text. We find that initializing the vision tower of a giant CLIP from EVA can greatly stabilize the training and outperform the trained-from-scratch counterpart with much fewer samples and less compute, providing a new direction for scaling up and accelerating the costly training of multi-modal foundation models. To facilitate future research, we release all the code and models at https://github.com/baaivision/EVA.
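The pretext objective described above, regressing the CLIP vision features of masked patches, can be sketched as below; the cosine-similarity form of the regression loss and all tensor shapes are assumptions used only to illustrate the structure of the objective.

```python
# Hedged sketch of a masked feature-reconstruction loss in the spirit of EVA's pretext task.
import torch
import torch.nn.functional as F

def masked_feature_loss(student_patch_feats, teacher_patch_feats, mask):
    """
    student_patch_feats: (B, N, D) features predicted by the ViT being trained
    teacher_patch_feats: (B, N, D) frozen CLIP vision features (the regression target)
    mask:                (B, N) boolean, True where the patch was masked out
    """
    pred = F.normalize(student_patch_feats[mask], dim=-1)
    target = F.normalize(teacher_patch_feats[mask], dim=-1)
    # negative cosine similarity, averaged over masked positions only
    return -(pred * target).sum(dim=-1).mean()
```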
Object re-identification (ReID) is one of the most important problems in biometrics and surveillance systems and has been widely studied by the image processing and computer vision communities over the past decades. Learning a robust and discriminative feature representation is a key challenge in object ReID. Within ReID, unmanned aerial vehicle (UAV) based ReID is even more challenging, because the images are characterized by continuous changes in the camera parameters of the flying UAV (e.g., view angle, altitude, etc.). To address this challenge, multi-scale feature representations have been considered to characterize images captured from UAV flights at different altitudes. In this work, we propose a multi-task learning approach that employs a new multi-scale, convolution-free architecture, the Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID. By modeling the uncertainty of intra-class variations, our proposed model can be jointly optimized with uncertainty-aware object ID and camera ID information. Experimental results are reported on PRAI and VRAI, two ReID datasets from aerial surveillance, verifying the effectiveness of our proposed approach.
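One common way to combine such uncertainty-aware task terms is a learned homoscedastic-uncertainty weighting; the sketch below shows that style of multi-task loss for object-ID and camera-ID classification. It is only an illustration of the general technique, not the paper's exact formulation, and the head shapes are placeholders.

```python
# Uncertainty-weighted multi-task ReID loss: object-ID and camera-ID terms with
# learned per-task log-variances (Kendall-style weighting).
import torch
import torch.nn as nn

class UncertaintyWeightedReIDLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_var_id = nn.Parameter(torch.zeros(1))   # learned task uncertainties
        self.log_var_cam = nn.Parameter(torch.zeros(1))
        self.ce = nn.CrossEntropyLoss()

    def forward(self, id_logits, id_labels, cam_logits, cam_labels):
        loss_id = self.ce(id_logits, id_labels)
        loss_cam = self.ce(cam_logits, cam_labels)
        return (torch.exp(-self.log_var_id) * loss_id + self.log_var_id +
                torch.exp(-self.log_var_cam) * loss_cam + self.log_var_cam)
```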
Purpose: Localization and segmentation of individual bones is an important preprocessing step in many planning and navigation applications. However, it is a time-consuming and repetitive task if done manually. This is true not only for clinical practice but also for the acquisition of training data. Therefore, we propose not only an end-to-end learned algorithm that is capable of segmenting 125 distinct bones in upper-body CT, but also an ensemble-based uncertainty measure that helps to single out scans for enlarging the training dataset. Methods: We create a fully automated, end-to-end learned segmentation using a neural network architecture inspired by the 3D-UNet and fully supervised training. Results are improved using ensembles and inference-time augmentation. We investigate the relationship between ensemble uncertainty and the prospective usefulness of an unlabeled scan as part of the training dataset. Results: Our method is evaluated on an in-house dataset of 16 upper-body CT scans with a resolution of 2 mm in each dimension. Taking all 125 bones in our label set into account, our most successful ensemble achieves a median Dice score coefficient of 0.83. We find a lack of correlation between a scan's ensemble uncertainty and its prospective influence on the accuracy achieved after enlarging the training set. At the same time, we show that the ensemble uncertainty correlates with the number of voxels that need manual correction after an initial automatic segmentation, thus minimizing the time required to finalize a new ground-truth segmentation. Conclusion: In combination, scans with low ensemble uncertainty require less annotation time while yielding similar future DSC improvements. They are therefore ideal candidates for enlarging a training set for the segmentation of distinct upper-body bones from CT scans.
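As an illustration of how a scan-level ensemble uncertainty can be computed for ranking annotation candidates, here is a minimal sketch that averages the per-voxel entropy of the ensemble-mean class distribution; the exact aggregation used in the paper may differ, and the function signature is hypothetical.

```python
# Scan-level ensemble uncertainty: run each ensemble member, average the softmax
# outputs, and reduce per-voxel entropy to a single score per CT volume.
import torch

def scan_uncertainty(models, volume: torch.Tensor) -> float:
    """volume: (1, 1, D, H, W) CT volume; models: list of trained 3D segmentation nets."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(volume), dim=1) for m in models]).mean(dim=0)
    # per-voxel entropy of the ensemble-averaged class distribution (125 bones + background)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)
    return entropy.mean().item()  # higher value = better candidate for manual annotation
```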